{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "d860a88f-a209-4502-a326-df204702cdcd",
   "metadata": {},
   "source": [
    "# Iteration with Pandas\n",
    "\n",
    "## [Dr. Tirthajyoti Sarkar](https://www.linkedin.com/in/tirthajyoti-sarkar-2127aa7/), Fremont, CA\n",
    "\n",
    "As data scientists, all of us have been there.\n",
    "\n",
    "We are given a large Pandas DataFrame and asked to check some relationships between various fields in the columns - in a row-by-row fashion. It could be some logical operation or some sophisticated mathematical transformation on the raw data.\n",
    "\n",
    "Essentially, it is a **simple case of iterating over the rows of the DataFrame** and doing some processing at each iteration.\n",
    "\n",
    "However, it **may not be that simple in terms of choosing the most efficient method** of executing this type of task. You might already have a hunch that a simple for-loop is going to be quite inefficient for this iteration task. \n",
    "\n",
    "And, you will be absolutely right. So, what can we do to avoid such speed pitfall?\n",
    "\n",
    "This type of knowledge is critical to practice what we call [\"Productive Data Science\"](https://medium.com/productive-data-science/why-and-how-should-you-learn-productive-data-science-53377b473f37)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "1c6f700f-9cf9-4793-872a-7e60c6c5d67b",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np, pandas as pd, matplotlib.pyplot as plt\n",
    "from tqdm import tqdm\n",
    "from time import time"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "03e83313-dddd-4335-965e-e1c742795156",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>A</th>\n",
       "      <th>B</th>\n",
       "      <th>C</th>\n",
       "      <th>D</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>95</td>\n",
       "      <td>11</td>\n",
       "      <td>81</td>\n",
       "      <td>70</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>63</td>\n",
       "      <td>87</td>\n",
       "      <td>75</td>\n",
       "      <td>9</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>77</td>\n",
       "      <td>40</td>\n",
       "      <td>4</td>\n",
       "      <td>63</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>40</td>\n",
       "      <td>60</td>\n",
       "      <td>92</td>\n",
       "      <td>64</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>5</td>\n",
       "      <td>12</td>\n",
       "      <td>93</td>\n",
       "      <td>40</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    A   B   C   D\n",
       "0  95  11  81  70\n",
       "1  63  87  75   9\n",
       "2  77  40   4  63\n",
       "3  40  60  92  64\n",
       "4   5  12  93  40"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "np.random.seed(101)\n",
    "df = pd.DataFrame(np.random.randint(0,100,size=(100000, 4)), \n",
    "                  columns=list('ABCD'),dtype=np.int16)\n",
    "df.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "cb40c948-489c-41ec-b454-b41dfb2689ba",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<class 'pandas.core.frame.DataFrame'>\n",
      "RangeIndex: 100000 entries, 0 to 99999\n",
      "Data columns (total 4 columns):\n",
      " #   Column  Non-Null Count   Dtype\n",
      "---  ------  --------------   -----\n",
      " 0   A       100000 non-null  int16\n",
      " 1   B       100000 non-null  int16\n",
      " 2   C       100000 non-null  int16\n",
      " 3   D       100000 non-null  int16\n",
      "dtypes: int16(4)\n",
      "memory usage: 781.4 KB\n"
     ]
    }
   ],
   "source": [
    "df.info(memory_usage=\"deep\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "2b863004-8a6b-423a-81a8-e8dd77b4d2dd",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>A</th>\n",
       "      <th>B</th>\n",
       "      <th>C</th>\n",
       "      <th>D</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>count</th>\n",
       "      <td>100000.000000</td>\n",
       "      <td>100000.000000</td>\n",
       "      <td>100000.000000</td>\n",
       "      <td>100000.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>mean</th>\n",
       "      <td>49.562640</td>\n",
       "      <td>49.533820</td>\n",
       "      <td>49.477140</td>\n",
       "      <td>49.495560</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>std</th>\n",
       "      <td>28.885105</td>\n",
       "      <td>28.907816</td>\n",
       "      <td>28.789594</td>\n",
       "      <td>28.849037</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>min</th>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25%</th>\n",
       "      <td>25.000000</td>\n",
       "      <td>24.000000</td>\n",
       "      <td>25.000000</td>\n",
       "      <td>24.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>50%</th>\n",
       "      <td>50.000000</td>\n",
       "      <td>50.000000</td>\n",
       "      <td>50.000000</td>\n",
       "      <td>50.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>75%</th>\n",
       "      <td>75.000000</td>\n",
       "      <td>75.000000</td>\n",
       "      <td>74.000000</td>\n",
       "      <td>74.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>max</th>\n",
       "      <td>99.000000</td>\n",
       "      <td>99.000000</td>\n",
       "      <td>99.000000</td>\n",
       "      <td>99.000000</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                   A              B              C              D\n",
       "count  100000.000000  100000.000000  100000.000000  100000.000000\n",
       "mean       49.562640      49.533820      49.477140      49.495560\n",
       "std        28.885105      28.907816      28.789594      28.849037\n",
       "min         0.000000       0.000000       0.000000       0.000000\n",
       "25%        25.000000      24.000000      25.000000      24.000000\n",
       "50%        50.000000      50.000000      50.000000      50.000000\n",
       "75%        75.000000      75.000000      74.000000      74.000000\n",
       "max        99.000000      99.000000      99.000000      99.000000"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.describe()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2fcaa5a5-e4a7-4417-a00a-f71e49de79fe",
   "metadata": {},
   "source": [
    "## Most inefficient _for_ loop"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "35667490-c0ce-49be-8005-5181d71d656f",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Time:32.22 seconds\n",
      "Count: 49769\n"
     ]
    }
   ],
   "source": [
    "count=0\n",
    "t1=time()\n",
    "for i in range(len(df)):\n",
    "    if df.iloc[i]['A']+df.iloc[i]['B'] > df.iloc[i]['C']+df.iloc[i]['D']:\n",
    "        count+=1\n",
    "t2=time()\n",
    "delt1 = round((t2-t1),2)\n",
    "print(f\"Time:{delt1} seconds\")\n",
    "print(\"Count:\",count)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7bf463cd-e887-4257-ad58-8f3cda024101",
   "metadata": {},
   "source": [
    "## Comparing `iterrows()` and `df.values` - 1"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "37ecedc8-fd98-4e6c-a0c3-14a5a2a7324e",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Time:6.91 seconds\n",
      "Count: 49769\n"
     ]
    }
   ],
   "source": [
    "count=0\n",
    "t1=time()\n",
    "for idx, row in df.iterrows():\n",
    "    if row['A']+row['B'] > (row['C']+row['D']):\n",
    "        count+=1\n",
    "t2=time()\n",
    "delt1 = round((t2-t1),2)\n",
    "print(f\"Time:{delt1} seconds\")\n",
    "print(\"Count:\",count)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "cc1c3e1d-6d2b-4f1b-908e-6af8a0a2ff8c",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Time:0.112 seconds\n",
      "Count: 49769\n"
     ]
    }
   ],
   "source": [
    "count=0\n",
    "t1=time()\n",
    "for row in df.values:\n",
    "    if row[0]+row[1] > (row[2]+row[3]):\n",
    "        count+=1\n",
    "t2=time()\n",
    "delt2 = round((t2-t1),3)\n",
    "print(f\"Time:{delt2} seconds\")\n",
    "print(\"Count:\",count)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "a09a19c8-df6c-4681-a40e-ba74006609c9",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "df.values is 61.7 times faster\n"
     ]
    }
   ],
   "source": [
    "print(f\"df.values is {round(delt1/delt2,2)} times faster\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "41979a9d-ee10-4fc4-a614-d75bd907bd27",
   "metadata": {},
   "source": [
    "## Comparing `iterrows()` and `df.values` - 2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "6b5ccfa4-1a3d-462a-ac1a-14778ace0194",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Time:8.05 seconds\n",
      "Count: 35886\n"
     ]
    }
   ],
   "source": [
    "count=0\n",
    "t1=time()\n",
    "for idx, row in df.iterrows():\n",
    "    if row['A']+row['B'] > 1.25*(row['C']+row['D']):\n",
    "        count+=1\n",
    "t2=time()\n",
    "delt1 = round((t2-t1),2)\n",
    "print(f\"Time:{delt1} seconds\")\n",
    "print(\"Count:\",count)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "34e52057-5ef4-4b14-997d-3ef3337d8ce7",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Time:0.546 seconds\n",
      "Count: 35886\n"
     ]
    }
   ],
   "source": [
    "count=0\n",
    "t1=time()\n",
    "for row in df.values:\n",
    "    if row[0]+row[1] > 1.25*(row[2]+row[3]):\n",
    "        count+=1\n",
    "t2=time()\n",
    "delt2 = round((t2-t1),3)\n",
    "print(f\"Time:{delt2} seconds\")\n",
    "print(\"Count:\",count)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "d76203cf-a0a1-40db-97cc-591cabd210a8",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "df.values is 14.74 times faster\n"
     ]
    }
   ],
   "source": [
    "print(f\"df.values is {round(delt1/delt2,2)} times faster\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8f00b66c-e91c-4390-bb48-7edc9f785144",
   "metadata": {},
   "source": [
    "## Comparing `iterrows()` and `df.values` - 3"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "e619a3c6-712a-49a8-b47b-4eece6205e97",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Time:8.76 seconds\n",
      "Count: 9202\n"
     ]
    }
   ],
   "source": [
    "count=0\n",
    "t1=time()\n",
    "for idx, row in df.iterrows():\n",
    "    if np.log(1+row['A']+row['B']) > np.sqrt(0.5*(row['C']+row['D'])):\n",
    "        count+=1\n",
    "t2=time()\n",
    "delt1 = round((t2-t1),2)\n",
    "print(f\"Time:{delt1} seconds\")\n",
    "print(\"Count:\",count)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "6163d0dc-4be7-4533-90a2-06d6a997fdb9",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Time:0.962 seconds\n",
      "Count: 9202\n"
     ]
    }
   ],
   "source": [
    "count=0\n",
    "t1=time()\n",
    "for row in df.values:\n",
    "    if np.log(1+row[0]+row[1]) > np.sqrt(0.5*(row[2]+row[3])):\n",
    "        count+=1\n",
    "t2=time()\n",
    "delt2 = round((t2-t1),3)\n",
    "print(f\"Time:{delt2} seconds\")\n",
    "print(\"Count:\",count)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "id": "8821b345-847d-4d28-ba1a-0d05b887d22f",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "df.values is 9.11 times faster\n"
     ]
    }
   ],
   "source": [
    "print(f\"df.values is {round(delt1/delt2,2)} times faster\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3c11a370-bd6b-4862-a628-536dfab3c09b",
   "metadata": {},
   "source": [
    "## Simple vectorized operation is fastest in this counting example"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "id": "cfe80cb8-e4d1-4f51-aaad-e71b02c479e3",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Time:0.01 seconds\n",
      "Count: 9202\n"
     ]
    }
   ],
   "source": [
    "t1 = time()\n",
    "df['result'] = np.log(1+df['A']+df['B']) > np.sqrt(0.5*(df['C']+df['D']))\n",
    "t2=time()\n",
    "delt3 = round((t2-t1),3)\n",
    "print(f\"Time:{delt3} seconds\")\n",
    "print(\"Count:\",df['result'].sum())"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a0709632-41cc-4e9f-8482-7b9cff0dc480",
   "metadata": {},
   "source": [
    "## String identifier"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "id": "26d4fd9f-eaf6-4730-b6ba-ab353e677594",
   "metadata": {},
   "outputs": [],
   "source": [
    "def identifier():\n",
    "    \"\"\"\n",
    "    Generates random identifier string of 5 characters\n",
    "    \"\"\"\n",
    "    letters = list('CFJQZ')\n",
    "    numbers = list('123456789')\n",
    "    \n",
    "    random_id = ''\n",
    "    random_id+=np.random.choice(letters)\n",
    "    random_id+=np.random.choice(letters)\n",
    "    random_id+=np.random.choice(numbers)\n",
    "    random_id+=np.random.choice(numbers)\n",
    "    #random_id+=np.random.choice(numbers)\n",
    "    #random_id+=np.random.choice(numbers)\n",
    "    #random_id+=np.random.choice(letters)\n",
    "    random_id+=np.random.choice(letters)\n",
    "    \n",
    "    return random_id"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "id": "02d1f2f0-87d2-4125-8a0b-83382463bcb0",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "JQ98C\n",
      "CC24Z\n",
      "ZQ25Q\n",
      "QC54C\n",
      "ZZ24Z\n",
      "JF91Z\n",
      "FQ89C\n",
      "FQ15F\n",
      "CZ59F\n",
      "ZC59F\n"
     ]
    }
   ],
   "source": [
    "for i in range(10):\n",
    "    print(identifier())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "id": "2c5e1160-91cc-49ed-a541-2bb424d79cad",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>A</th>\n",
       "      <th>B</th>\n",
       "      <th>C</th>\n",
       "      <th>D</th>\n",
       "      <th>result</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>95</td>\n",
       "      <td>11</td>\n",
       "      <td>81</td>\n",
       "      <td>70</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>63</td>\n",
       "      <td>87</td>\n",
       "      <td>75</td>\n",
       "      <td>9</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>77</td>\n",
       "      <td>40</td>\n",
       "      <td>4</td>\n",
       "      <td>63</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>40</td>\n",
       "      <td>60</td>\n",
       "      <td>92</td>\n",
       "      <td>64</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>5</td>\n",
       "      <td>12</td>\n",
       "      <td>93</td>\n",
       "      <td>40</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    A   B   C   D  result\n",
       "0  95  11  81  70   False\n",
       "1  63  87  75   9   False\n",
       "2  77  40   4  63   False\n",
       "3  40  60  92  64   False\n",
       "4   5  12  93  40   False"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "id": "e6b372dd-9f5e-4028-9770-b171025a97a6",
   "metadata": {},
   "outputs": [],
   "source": [
    "id_lst=[]\n",
    "for i in range(100000):\n",
    "    id_lst.append(identifier())\n",
    "id_lst=np.array(id_lst)\n",
    "\n",
    "df.insert(0,'ID',id_lst)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "id": "0bf94057-e44c-4fba-ae12-7daff8c587a8",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>ID</th>\n",
       "      <th>A</th>\n",
       "      <th>B</th>\n",
       "      <th>C</th>\n",
       "      <th>D</th>\n",
       "      <th>result</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>78891</th>\n",
       "      <td>ZF36F</td>\n",
       "      <td>19</td>\n",
       "      <td>64</td>\n",
       "      <td>85</td>\n",
       "      <td>67</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6152</th>\n",
       "      <td>FC69F</td>\n",
       "      <td>21</td>\n",
       "      <td>2</td>\n",
       "      <td>22</td>\n",
       "      <td>12</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>99247</th>\n",
       "      <td>FC76F</td>\n",
       "      <td>3</td>\n",
       "      <td>77</td>\n",
       "      <td>78</td>\n",
       "      <td>32</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>63451</th>\n",
       "      <td>ZF94Q</td>\n",
       "      <td>30</td>\n",
       "      <td>95</td>\n",
       "      <td>10</td>\n",
       "      <td>12</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>40941</th>\n",
       "      <td>QJ46F</td>\n",
       "      <td>14</td>\n",
       "      <td>4</td>\n",
       "      <td>95</td>\n",
       "      <td>40</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          ID   A   B   C   D  result\n",
       "78891  ZF36F  19  64  85  67   False\n",
       "6152   FC69F  21   2  22  12   False\n",
       "99247  FC76F   3  77  78  32   False\n",
       "63451  ZF94Q  30  95  10  12    True\n",
       "40941  QJ46F  14   4  95  40   False"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.sample(5)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "id": "bbd50364-66b3-4703-99bf-a2e802772c04",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "10125"
      ]
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df['ID'].nunique()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "id": "23834f67-2c42-42a0-b533-3fdae891da5c",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Time:6.597 seconds\n"
     ]
    }
   ],
   "source": [
    "ratio_dict={'ID':[],'Ratio':[]}\n",
    "t1=time()\n",
    "for _,row in df.iterrows():\n",
    "    if row['ID'][0:2] == 'ZZ' and row['ID'][-1]=='F':\n",
    "        ratio = (row['A']+row['B'])/(0.01+row['C']+row['D'])\n",
    "        ratio_dict['ID'].append(row[0])\n",
    "        ratio_dict['Ratio'].append(ratio)\n",
    "t2=time()\n",
    "delt4 = round((t2-t1),3)\n",
    "print(f\"Time:{delt4} seconds\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "id": "4901d6f8-646c-4bcb-aedc-7502e69c02f3",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Time:0.056 seconds\n"
     ]
    }
   ],
   "source": [
    "ratio_dict={'ID':[],'Ratio':[]}\n",
    "t1=time()\n",
    "for row in df.values:\n",
    "    if row[0][0:2] == 'ZZ' and row[0][-1]=='F':\n",
    "        ratio = (row[1]+row[2])/(0.01+row[3]+row[4])\n",
    "        ratio_dict['ID'].append(row[0])\n",
    "        ratio_dict['Ratio'].append(ratio)\n",
    "t2=time()\n",
    "delt4 = round((t2-t1),3)\n",
    "print(f\"Time:{delt4} seconds\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "id": "a40af042-c101-43b6-a3f2-c8648c072672",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>ID</th>\n",
       "      <th>Ratio</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>ZZ43F</td>\n",
       "      <td>0.349377</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>ZZ79F</td>\n",
       "      <td>1.285600</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>ZZ57F</td>\n",
       "      <td>1.090744</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>ZZ17F</td>\n",
       "      <td>0.862388</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>ZZ56F</td>\n",
       "      <td>1.134152</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>822</th>\n",
       "      <td>ZZ34F</td>\n",
       "      <td>0.888724</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>823</th>\n",
       "      <td>ZZ22F</td>\n",
       "      <td>0.767811</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>824</th>\n",
       "      <td>ZZ59F</td>\n",
       "      <td>2.063838</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>825</th>\n",
       "      <td>ZZ47F</td>\n",
       "      <td>0.608283</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>826</th>\n",
       "      <td>ZZ71F</td>\n",
       "      <td>1.550388</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>827 rows × 2 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "        ID     Ratio\n",
       "0    ZZ43F  0.349377\n",
       "1    ZZ79F  1.285600\n",
       "2    ZZ57F  1.090744\n",
       "3    ZZ17F  0.862388\n",
       "4    ZZ56F  1.134152\n",
       "..     ...       ...\n",
       "822  ZZ34F  0.888724\n",
       "823  ZZ22F  0.767811\n",
       "824  ZZ59F  2.063838\n",
       "825  ZZ47F  0.608283\n",
       "826  ZZ71F  1.550388\n",
       "\n",
       "[827 rows x 2 columns]"
      ]
     },
     "execution_count": 24,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.DataFrame(ratio_dict)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
